随着机器学习模型在自动驾驶汽车(AV)的运动预测系统上变得越来越普遍,至关重要的是,我们必须确保模型预测是安全可靠的。但是,详尽地收集和标记充分测试稀有和挑战性场景的长尾所需的数据是困难且昂贵的。在这项工作中,我们构建了一个新的基准测试,用于通过将扰动应用于现有数据来评估和改善模型鲁棒性。具体而言,我们进行了广泛的标签努力,以识别因果因素,或者在Waymo Open Motion数据集(WOMD)中以任何方式影响人类驾驶员行为的代理,我们使用这些标签来通过删除非carusal剂来扰动数据从现场。然后,我们在我们提出的基准上评估了一套各种最先进的深度学习模型体系结构,并发现所有模型在扰动下均显示出很大的变化。在非作业扰动下,我们观察到$ 25 $ - $ 38 \%$ $相对变化,而与原始相比。然后,我们研究以提高模型鲁棒性的技术,包括增加训练数据集的大小以及使用靶向数据增强,这些数据增加在整个培训过程中都放下了代理。我们计划提供因果代理标签作为womd的附加属性,并释放稳健性基准,以帮助社区建立更可靠和安全的深度学习模型,以进行运动预测。
translated by 谷歌翻译
最大化模型准确性的常规配方是(1)具有各种超参数的多个模型,以及(2)选择在固定验证集中表现最佳的单个模型,从而丢弃其余部分。在本文中,我们在微调大型预训练的模型的背景下重新审视了该过程的第二步,其中微调模型通常位于单个低误差盆地中。我们表明,平均多种模型的权重以不同的超参数配置进行了微调通常提高准确性和鲁棒性。与传统的合奏不同,我们可能会平均许多模型,而不会产生任何其他推理或记忆成本 - 我们将结果称为“模型汤”。当微调大型预训练的模型,例如夹子,Align和VIT-G在JFT上预先训练的VIT-G时,我们的汤食谱可为ImageNet上的超参数扫描中的最佳模型提供显着改进。所得的VIT-G模型在Imagenet上达到90.94%的TOP-1准确性,实现了新的最新状态。此外,我们表明,模型汤方法扩展到多个图像分类和自然语言处理任务,改善分发性能,并改善新下游任务的零局部性。最后,我们通过分析将权重平衡和与logit浓度的性能相似与预测的损失和信心的平坦度联系起来,并经过经验验证这种关系。代码可从https://github.com/mlfoundations/model-soups获得。
translated by 谷歌翻译
尽管他们能够代表高度表现力的功能,但深度学习模型似乎找到了简单的解决方案,这些解决方案令人惊讶地概括了。光谱偏见 - 神经网络优先学习低频功能的趋势 - 是对此现象的一种可能解释,但是到目前为止,在理论模型和简化实验中,主要观察到了光谱偏差。在这项工作中,我们提出了用于测量CIFAR-10和Imagenet上现代图像分类网络中光谱偏差的方法。我们发现这些网络确实表现出光谱偏差,并且提高CIFAR-10测试准确性的干预措施往往会产生学到的功能,这些功能总体上具有较高的频率,但在每个类别的示例附近频率较低。这种趋势在培训时间,模型架构,培训示例的数量,数据增强和自我介绍的变化之间存在。我们还探索了功能频率和图像频率之间的连接,并发现光谱偏置对自然图像中普遍存在的低频敏感。在Imagenet上,我们发现学习的功能频率也随内部类别的多样性而变化,并且在更多样化的类别上具有较高的频率。我们的工作使测量并最终影响用于图像分类的神经网络的光谱行为,并且是理解为什么深层模型良好概述的一步。
translated by 谷歌翻译
执行零摄像推理时(即,在特定数据集上不进行微调)时,大型预训练的模型(例如剪辑或ALIGN)在一系列数据分布中提供一致的精度。尽管现有的微调方法显着提高了给定目标分布的准确性,但它们通常会降低分配变化的稳健性。我们通过引入一种简单有效的方法来提高鲁棒性,同时进行微调:结合零拍和微调模型(Wise-ft)的重量。与标准的微调相比,Wise-FT在分配变化下提供了巨大的准确性提高,同时保留了目标分布的高精度。在Imagenet和五个派生的分布变化上,Wise-FT在先前的工作中提高了分布转移的准确性4至6个百分点(PP),同时将Imagenet精度提高1.6pp。Wise-ft的稳健性相似(2至23 pp),明智之前与七个常用的转移学习数据集的标准微调相比,在一组进一步的分配转移的各种集合中,准确性增长率为0.8至3.3 pp。这些改进在微调或推理期间没有任何额外的计算成本。
translated by 谷歌翻译
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% -15% on CIFAR-10 and 11% -14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
translated by 谷歌翻译
Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to identify driving preferences and produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we use a combination of imitation and reinforcement learning to train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision risk. To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
translated by 谷歌翻译
Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially aware graph neural network model consisting of GraphSAGE layers to forecast the presence of West Nile virus in Illinois, to aid mosquito surveillance and abatement efforts within the state. More generally, we show that graph neural networks applied to irregularly sampled geospatial data can exceed the performance of a range of baseline methods including logistic regression, XGBoost, and fully-connected neural networks.
translated by 谷歌翻译
Electronic Health Records (EHRs) hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Temporal modelling of this medical history, which considers the sequence of events, can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications. While most prediction approaches use mainly structured data or a subset of single-domain forecasts and outcomes, we processed the entire free-text portion of EHRs for longitudinal modelling. We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events such as disorders, medications, symptoms and interventions. Since large portions of EHR data are in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. On tests in two large UK hospitals (King's College Hospital, South London and Maudsley) and the US MIMIC-III dataset precision@10 of 0.80, 0.81 and 0.91 was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by 5 clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. Foresight can be easily trained and deployed locally as it only requires free-text data (as a minimum). As a generative model, it can simulate follow-on disorders, medications and interventions for as many steps as required. Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk estimation, virtual trials and clinical research to study the progression of diseases, simulate interventions and counterfactuals, and for educational purposes.
translated by 谷歌翻译
Because of their close relationship with humans, non-human apes (chimpanzees, bonobos, gorillas, orangutans, and gibbons, including siamangs) are of great scientific interest. The goal of understanding their complex behavior would be greatly advanced by the ability to perform video-based pose tracking. Tracking, however, requires high-quality annotated datasets of ape photographs. Here we present OpenApePose, a new public dataset of 71,868 photographs, annotated with 16 body landmarks, of six ape species in naturalistic contexts. We show that a standard deep net (HRNet-W48) trained on ape photos can reliably track out-of-sample ape photos better than networks trained on monkeys (specifically, the OpenMonkeyPose dataset) and on humans (COCO) can. This trained network can track apes almost as well as the other networks can track their respective taxa, and models trained without one of the six ape species can track the held out species better than the monkey and human models can. Ultimately, the results of our analyses highlight the importance of large specialized databases for animal tracking systems and confirm the utility of our new ape database.
translated by 谷歌翻译
Knowledge of the symmetries of reinforcement learning (RL) systems can be used to create compressed and semantically meaningful representations of a low-level state space. We present a method of automatically detecting RL symmetries directly from raw trajectory data without requiring active control of the system. Our method generates candidate symmetries and trains a recurrent neural network (RNN) to discriminate between the original trajectories and the transformed trajectories for each candidate symmetry. The RNN discriminator's accuracy for each candidate reveals how symmetric the system is under that transformation. This information can be used to create high-level representations that are invariant to all symmetries on a dataset level and to communicate properties of the RL behavior to users. We show in experiments on two simulated RL use cases (a pusher robot and a UAV flying in wind) that our method can determine the symmetries underlying both the environment physics and the trained RL policy.
translated by 谷歌翻译